MP-Net: Deep learning-based segmentation for fluorescence microscopy images of microplastics isolated from clams

Environmental monitoring of microplastic (MP) contamination has become an area of great research interest, given the potential hazards associated with human ingestion of MP. In this context, determining MP concentration is essential. However, cheap, rapid, and accurate quantification of MP remains a challenge to date. This study proposes a deep learning-based image segmentation method that distinguishes fluorescent MP from other elements in a microscopy image. A total of nine deep learning models, six of which are based on U-Net, were investigated. These models were trained on at least 20,000 patches sampled from 99 fluorescence microscopy images of MP and their corresponding binary masks. MP-Net, a model derived from U-Net, performed best, achieving the highest mean F1-score (0.736) and mean IoU (0.617). Test-time augmentation (brightness, contrast, and HSV adjustments) was applied to MP-Net to improve robustness; however, no clear improvement in predictive performance was observed compared to the results obtained without augmentation. A recovery assessment on both spiked and real images showed that the MP quantities predicted by MP-Net are closer to the ground truth than those obtained with existing MP quantification tools. This suggests that MP-Net produces masks that more accurately reflect the quantitative presence of fluorescent MP in microscopy images. Finally, we introduce MAP (Microplastics Annotation Package), an integrated software environment for automated MP quantification that supports MP-Net, existing MP analysis tools such as MP-VAT, manual annotation, and model fine-tuning.
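The F1-score and IoU reported above are standard overlap metrics for binary segmentation masks. As a minimal sketch, assuming pixel-wise computation on binary masks in which True marks MP pixels, both follow from the true positives (TP), false positives (FP) and false negatives (FN): F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The function name `f1_and_iou` and the convention of scoring two empty masks as perfect agreement are illustrative assumptions, not taken from the MP-Net code.

```python
import numpy as np


def f1_and_iou(pred, truth) -> tuple:
    """Pixel-wise F1-score and IoU of two binary masks (True = MP)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.logical_and(pred, truth).sum())   # MP predicted as MP
    fp = int(np.logical_and(pred, ~truth).sum())  # background predicted as MP
    fn = int(np.logical_and(~pred, truth).sum())  # MP predicted as background
    if tp + fp + fn == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


pred = [[1, 1, 0], [0, 0, 0]]
truth = [[1, 0, 0], [0, 1, 0]]
print(f1_and_iou(pred, truth))  # (0.5, 0.3333333333333333)
```

Here the prediction finds one of the two MP pixels and adds one false MP pixel, giving F1 = 0.5 and IoU = 1/3. A mean score would then be an average of such values over the evaluated images or patches; which averaging was used in this study is not specified in this section.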

As explained in the main text and shown in Figure 1 (a), it is difficult to label all MP accurately using a single threshold. Figure 1 (b) shows an example of an incorrect prediction (Type 2 error); similar incorrect predictions have also been reported in [1]. Since uncertain labels can significantly reduce the effectiveness of TR-based deep learning models, we improved label quality by incorporating individual pixel annotation (IPA), as shown in Figure 2. Although this required a substantial amount of manual effort, it was necessary to ensure that our TR-based models are trained on properly annotated images. To reduce annotator bias, three researchers took part in the annotation process, which was carried out in Microsoft Paint and Medibang Paint. The final mask was obtained by majority voting across the three annotators.
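Pixel-wise majority voting can be sketched as follows, assuming each annotator's result is exported as a binary mask of identical size (True = MP). With three annotators, a pixel becomes MP when at least two of them marked it. The helper name `majority_vote` is hypothetical and only illustrates the voting rule, not the authors' tooling.

```python
import numpy as np


def majority_vote(masks) -> np.ndarray:
    """Pixel-wise majority vote over binary annotator masks (True = MP).

    A pixel is labelled MP when strictly more than half of the annotators marked it.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])  # (annotators, H, W)
    votes = stack.sum(axis=0)  # number of annotators marking each pixel as MP
    return votes * 2 > len(masks)


a = [[1, 0], [1, 0]]
b = [[1, 1], [0, 0]]
c = [[0, 1], [1, 0]]
print(majority_vote([a, b, c]).tolist())  # [[True, True], [True, False]]
```

With an odd number of annotators, as here, ties cannot occur, so no tie-breaking rule is needed.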
(a) Classification of MP according to a particular threshold value T. The x-axis denotes pixel intensity and the y-axis the number of pixels with the corresponding intensity value. Blue indicates the distribution of background pixels and black the distribution of MP pixels. Errors introduced during staining and imaging of MP create an overlapping range of intensities in which pixels are difficult to categorize.
(b) An example of a Type 2 error. The pixels indicated by the red arrow appear as MP in the fluorescence microscopy image, but are classified as background because they are darker than the neighbouring MP pixels.
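The single-threshold problem of Figure 1 can be reproduced on a toy example. This sketch assumes a global rule that labels pixels brighter than T as MP; `threshold_mask` and `error_counts` are hypothetical helpers. A Type 1 error is a background pixel labelled as MP (false positive) and a Type 2 error is an MP pixel labelled as background (false negative). Because the dim MP pixel (90) is darker than a bright background pixel (110), the two intensity distributions overlap and no choice of T removes both error types.

```python
import numpy as np


def threshold_mask(image, t: float) -> np.ndarray:
    """Label pixels brighter than the threshold t as MP (assumed global rule)."""
    return np.asarray(image) > t


def error_counts(pred, truth) -> tuple:
    """Return (Type 1, Type 2) pixel error counts of a predicted mask."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    type1 = int(np.logical_and(pred, ~truth).sum())  # background labelled MP
    type2 = int(np.logical_and(~pred, truth).sum())  # MP labelled background
    return type1, type2


# Toy 1 x 5 image: the MP pixel at 90 is darker than the background pixel at 110.
image = np.array([[20, 90, 140, 200, 110]])
truth = np.array([[0, 1, 1, 1, 0]], dtype=bool)
print(error_counts(threshold_mask(image, 100), truth))  # (1, 1)
# Every threshold from 0 to 255 leaves at least one misclassified pixel.
print(min(sum(error_counts(threshold_mask(image, t), truth)) for t in range(256)))  # 1
```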

S1.2 Image patch extraction using a sliding window approach and over-sampling
A deep neural network usually has a large number of parameters. As a result, model training typically requires a substantial amount of data. However, our training dataset contains only 80 images. If we simply sliced each image into non-overlapping patches of 256 × 256 pixels, we would obtain 15,983 patches, of which only 2,123 contain at least one piece of MP; in other words, MP is present in only 13.3% of the patches. To overcome the lack of available data and to mitigate the imbalance between the numbers of MP and background pixels, we applied two methods to create a dataset better suited for training: a sliding window and over-sampling. First, as shown in Figure 3, overlapping patches were cropped from each image using a stride of 30 pixels. Since cropping similar patches multiple times may cause overfitting, transfer learning was used, as discussed in Supporting information S2.5. Second, patches without MP were removed from the sampled patches, implementing a form of over-sampling, a method commonly used in machine learning to address class imbalance [2]. Dataset B was created using this methodology and contains no patches without MP, whereas in Dataset A, 5% of the patches do not contain any MP.
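The sliding-window sampling and the removal of empty patches can be sketched as below, assuming 2-D grayscale images with a binary mask of the same shape. The helpers `window_starts` and `extract_patches` are hypothetical, and border handling is an assumption: windows that would extend past the image edge are skipped, so a strip at the right or bottom edge may go unsampled when the image size minus 256 is not a multiple of the stride. Setting `stride=256` corresponds to the non-overlapping slicing behind the patch counts above (2,123 / 15,983 ≈ 13.3%).

```python
import numpy as np


def window_starts(length: int, patch: int, stride: int) -> list:
    """Offsets of windows that fit entirely inside an axis of the given length."""
    if length < patch:
        return []
    return list(range(0, length - patch + 1, stride))


def extract_patches(image, mask, patch=256, stride=30, keep_empty=False):
    """Crop overlapping patch x patch windows from an image and its binary mask.

    With keep_empty=False, windows whose mask contains no MP pixel are dropped,
    so every returned patch contains at least one MP pixel.
    """
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    patches = []
    for y in window_starts(image.shape[0], patch, stride):
        for x in window_starts(image.shape[1], patch, stride):
            window_mask = mask[y:y + patch, x:x + patch]
            if keep_empty or window_mask.any():
                patches.append((image[y:y + patch, x:x + patch], window_mask))
    return patches


# A 300 x 300 image with a single MP pixel near the top-left corner.
img = np.zeros((300, 300), dtype=np.uint8)
msk = np.zeros((300, 300), dtype=bool)
msk[10, 10] = True
print(len(extract_patches(img, msk, keep_empty=True)), len(extract_patches(img, msk)))  # 4 1
```

Keeping a fixed share of empty patches, as in Dataset A, would require an additional sampling step that this sketch does not include.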

S1.3 Under- and overestimation of MP count by the different MP-VAT versions
The MP count in the 99 fluorescence images in Dataset B was estimated using MP-VAT, MP-VAT 2.0, and C-VAT. These predicted MP counts were compared to what we consider the true MP count, derived from the masks following the procedure described in Supporting information S1.1. The error between the predicted and true counts was calculated as

$$\text{Error} = \frac{N_{\text{pred}} - N_{\text{true}}}{N_{\text{true}}} \times 100\%,$$

where $N_{\text{pred}}$ and $N_{\text{true}}$ are the predicted and true MP counts, respectively. At 0% error, depicted by the red line, the predicted count equals the true count; above and below the red line, the predicted count is higher and lower than the true count, respectively. Predictions with an error higher than 300% are not shown for clarity.
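The count comparison can be sketched as follows. This assumes that each 4-connected region of MP pixels in a mask counts as one particle and that the error is the signed relative count error in percent; both are assumptions, since the exact counting rule of S1.1 is not reproduced here. `count_objects` and `percent_error` are hypothetical names. If SciPy is available, `scipy.ndimage.label` returns the same region count with its default (4-connected) structuring element in 2-D.

```python
from collections import deque

import numpy as np


def count_objects(mask) -> int:
    """Count 4-connected foreground regions in a binary mask.

    Assumption: each connected region of MP pixels is one MP particle.
    """
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    height, width = mask.shape
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y, x] or seen[y, x]:
                continue
            # New region found: flood-fill it so its pixels are not counted again.
            count += 1
            seen[y, x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return count


def percent_error(predicted: int, true: int) -> float:
    """Signed relative count error in percent (assumed definition).

    0 means the prediction matches; positive values mean overestimation.
    """
    if true == 0:
        raise ValueError("true count must be non-zero")
    return 100.0 * (predicted - true) / true


true_mask = np.array([[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 1]])
print(count_objects(true_mask))  # 3
errors = [percent_error(p, 10) for p in (5, 12, 50)]
print([e for e in errors if e <= 300])  # [-50.0, 20.0]  (400% is hidden from the plot)
```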